Europäisches Patentamt
European Patent Office
Office européen des brevets



Publication number: 0 574 937 A2



EUROPEAN PATENT APPLICATION 



Application number: 93109763.8
Date of filing: 18.06.93



Int. Cl.5: G06K 9/66



Priority: 19.06.92 US 901123

Date of publication of application:
22.12.93 Bulletin 93/51

Designated Contracting States:
AT BE CH DE DK ES FR GB GR IE IT LI LU MC NL PT SE

Applicant: UNITED PARCEL SERVICE OF AMERICA, INC.
400 Perimeter Center,
Terraces North
Atlanta, GA 30346 (US)



Inventor: Moed, Michael C.
71 Aiken Street, Appt. J9,
Norwalk, Connecticut 06851 (US)
Inventor: Lee, Chih-Ping
124 Coalpit Hill Road, #34
Danbury, Connecticut 06810 (US)

Representative: Patentanwälte Beetz - Timpe - Siegfried Schmitt-Fumian - Mayr
Steinsdorfstrasse 10
D-80538 München (DE)



Method and apparatus for input classification using a neural network.



The present invention is a classification method and apparatus for classifying an input into one of a plurality
of possible outputs. The invention generates a feature vector representative of the input. The invention then
calculates a distance measure from the feature vector to the center of each neuron of a plurality of neurons,
where each neuron is associated with one of the possible outputs. The invention then selects each neuron that
encompasses the feature vector in accordance with the distance measure. The invention then determines a vote
for each possible output, where the vote is the number of selected neurons that are associated with each
possible output. If the vote for one of the possible outputs is greater than all other votes for all other possible
outputs, then the invention selects that possible output as corresponding to the input. Otherwise, if the vote for
one of the possible outputs is not greater than all other votes for all other possible outputs, then the invention
identifies the neuron that has the smallest distance measure of all the neurons. If that smallest distance
measure is less than a specified value, then the invention selects the possible output associated with that
identified neuron as corresponding to the input.

The present invention is also a training method and apparatus for generating a neuron. The invention selects 
a plurality of training inputs, where each training input corresponds to a first possible output. The invention then 
characterizes the quality of each of the training inputs. The invention then selects from the characterized training 
inputs a training input that is of higher quality than at least one of the other training inputs. The invention then
creates a neuron in accordance with the selected characterized training input. 

The present invention is also a classification method for classifying an input into one of a plurality of 
possible outputs. The invention classifies the input into a cluster representative of two or more possible outputs. 
The invention then classifies the input into one of the two or more possible outputs represented by the cluster.
At least one of the classification steps of the invention is characterized by comparing information representative
of the input to a neuron. 

The invention is also a method and apparatus for adjusting a neuron encompassing a plurality of feature 
vectors. The invention characterizes the spatial distribution of the feature vectors. The invention then spatially 
adjusts the neuron in accordance with that characterization.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to classification methods and systems, and, in particular, to methods and
systems for classifying optically acquired character images and to methods and systems for training such
classification systems.

2. Statement of Related Art 

In the field of package shipping, packages are routed from origins to destinations throughout the world
according to destination addresses typed on shipping labels applied to these packages. In order to route
packages, it is desirable to use automated optical character classification systems that can read those
addresses. Such a classification system must be able to classify characters as quickly as possible.
Conventional optical character classification systems using spherical neurons, such as those disclosed in
U.S. Patent No. 4,326,259 (Cooper et al.), may be unable to execute the processing requirements presented
by certain applications without a substantial investment in hardware.

SUMMARY OF THE INVENTION 

The present invention is a classification method and apparatus for classifying an input into one of a
plurality of possible outputs. The invention generates a feature vector representative of the input. The
invention then calculates a distance measure from the feature vector to the center of each neuron of a
plurality of neurons, where each neuron is associated with one of the possible outputs. The invention then
selects each neuron that encompasses the feature vector in accordance with the distance measure. The
invention then determines a vote for each possible output, where the vote is the number of selected
neurons that are associated with each possible output. If the vote for one of the possible outputs is greater
than all other votes for all other possible outputs, then the invention selects that possible output as
corresponding to the input. Otherwise, if the vote for one of the possible outputs is not greater than all other
votes for all other possible outputs, then the invention identifies the neuron that has the smallest distance
measure of all the neurons. If that smallest distance measure is less than a specified value, then the
invention selects the possible output associated with that identified neuron as corresponding to the input.

The present invention is also a training method and apparatus for generating a neuron. The invention
selects a plurality of training inputs, where each training input corresponds to a first possible output. The
invention then characterizes the quality of each of the training inputs. The invention then selects from the
characterized training inputs a training input that is of higher quality than at least one of the other training
inputs. The invention then creates a neuron in accordance with the selected characterized training input.

The present invention is also a classification method for classifying an input into one of a plurality of
possible outputs. The invention classifies the input into a cluster representative of two or more possible
outputs. The invention then classifies the input into one of the two or more possible outputs represented
by the cluster. At least one of the classification steps of the invention is characterized by comparing
information representative of the input to a neuron.

The invention is also a method and apparatus for adjusting a neuron encompassing a plurality of feature 
vectors. The invention characterizes the spatial distribution of the feature vectors. The invention then 
spatially adjusts the neuron in accordance with that characterization. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Figs. 1(a), 1(b), 1(c), and 1(d) are bitmap representations of a nominal letter "O", a degraded letter "O",
a nominal number "7", and a degraded number "7", respectively;
Fig. 2 is a graphical depiction of a 2-dimensional feature space populated with 8 elliptical neurons that
may be employed by the classification system of the present invention to classify images of the letters
A, B, and C;
Fig. 3 is a process flow diagram for classifying inputs according to a preferred embodiment of the
present invention;
Fig. 4 is a schematic diagram of part of the classification system of Fig. 3;
Fig. 5 is a process flow diagram for generating neurons used by the classification system of Fig. 3; and
Fig. 6 is a schematic diagram of a classification system that uses cluster classifiers for classifying inputs
according to a preferred embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

The present invention includes a system for optical character recognition, but, more generally, the
invention covers a classification system for classifying an input as one of a defined set of possible outputs.
For example, where the input is an optically acquired image representing one of the 26 capital letters of the
English alphabet, the classification system of the present invention may be used to select as an output that
capital letter that is associated with the input image. The classification system of the present invention is
discussed below in connection with Figs. 1(a), 2, 3, and 4.

The present invention also includes a system for "training" the classification system of the present
invention. This training system is preferably operated off line prior to deployment of the classification
system. In the character recognition example, the training system accepts input images representative of
known characters to "learn" about the set of possible outputs into which unknown images will eventually be
classified. The training system of the present invention is discussed below in connection with Fig. 5.

The present invention also includes a system for training the classification system of the present
invention based on ordering the training inputs according to the relative quality of the training inputs. This
system for training is discussed below in connection with Figs. 1(a), 1(b), 1(c), and 1(d).

The present invention also includes a system for adjusting the locations and shapes of neurons
generated by the training systems of the present invention.

The present invention also includes a classification system employing a hierarchical network of top-level
and lower-level cluster classifiers. The top-level classifier classifies inputs into one of a plurality of output
clusters, where each output cluster is associated with a subset of the set of possible outputs. A cluster
classifier, associated with the output cluster identified by the top-level classifier, then classifies the input as
corresponding to one of the possible outputs. This classification system is discussed below in connection
with Figs. 1(a), 1(b), 1(c), and 1(d).

The present invention also includes a neural system of classifying inputs that combines two subsystems.
One subsystem counts the number of neurons that encompass a feature vector representing a
particular input for each of the possible outputs. If one of the possible outputs has more neurons
encompassing the feature vector than any other possible output, then the system selects that possible
output as corresponding to that input. Otherwise, the second subsystem finds the neuron that has the
smallest value for a particular distance measure for that feature vector. If that value is less than a specified
threshold, then the system selects the output associated with that neuron as corresponding to the input. This
neural system is discussed below in connection with Figs. 1(a), 2, 3, and 4.

CLASSIFICATION SYSTEM 

Referring now to Fig. 1(a), there is shown a bitmap representation of a nominal letter "O". When the
classification system of the present invention classifies optically acquired character images, each character
image to be classified may be represented by an input bitmap, an (m x n) image array of binary values as
shown in Fig. 1(a). In a preferred embodiment, the classification system of the present invention generates a
vector in a k-dimensional feature space from information contained in each input bitmap. Each feature
vector F has feature elements $f_j$, where $0 \le j \le k-1$. The dimension of the feature space, k, may be any
integer greater than one. Each feature element $f_j$ is a real value corresponding to one of k features derived
from the input bitmap.

The k features may be derived from the input bitmap using conventional feature extraction functions,
such as, for example, the Grid or Hadamard feature extraction function. The feature vector F represents a
point in the k-dimensional feature space. The feature elements $f_j$ are the components of feature vector F
along the feature-space axes of the k-dimensional feature space. For purposes of this specification, the term
"feature vector" refers to a point in feature space.

In a preferred embodiment, a discriminant analysis transform may be applied to Grid-based or
Hadamard-based feature vectors to define the feature space. In this embodiment, the separation between
possible outputs may be increased and the dimensionality of the feature vector may be reduced by
performing this discriminant analysis in which only the most significant eigenvectors from the discriminant
transformation are retained.

The classification system of the present invention compares a feature vector F, representing a particular
input image, to a set of neurons in feature space, where each neuron is a closed k-dimensional region or
"hyper-volume" in the k-dimensional feature space. For example, when (k = 2), each neuron is an area in a
2-dimensional feature space, and when (k = 3), each neuron is a volume in a 3-dimensional feature space.
Fig. 2 shows a graphical depiction of an exemplary 2-dimensional feature space populated with eight
2-dimensional neurons.

In a preferred classification system according to the present invention, the boundary of at least one of
the neurons populating a k-dimensional feature space is defined by at least two axes that have different
lengths. Some of these neurons may be generally represented mathematically as:

$$\sum_{j=0}^{k-1} \frac{|c_j - g_j|^m}{(b_j)^m} \le A \qquad (1)$$

where $c_j$ define the center point of the neuron, $b_j$ are the lengths of the neuron axes, and m and A are
positive real constants. In a preferred embodiment, at least two of the neuron axes are of different length.
The values $g_j$ that satisfy Equation (1) define the points in feature space that lie within or on the boundary of
the neuron. Those skilled in the art will understand that other neurons within the scope of this invention may
be represented by other mathematical expressions. For example, a neuron may be defined by the
expression:



$$\mathop{\mathrm{MAX}}_{j=0}^{k-1} \frac{|c_j - g_j|}{b_j} \le 1 \qquad (2)$$

where the function "MAX" computes the maximum value of the ratio as j runs from 0 to k-1. Neurons
defined by Equation (2) are hyper-rectangles.
In a preferred embodiment of the present invention, the neurons are hyper-ellipses in the k-dimensional
feature space. A hyper-ellipse is any hyper-volume defined by Equation (1), where (m = 2) and (A = 1). More
particularly, a hyper-ellipse is defined by the function:

$$\sum_{j=0}^{k-1} \frac{(c_j - g_j)^2}{(b_j)^2} \le 1 \qquad (3)$$

where $c_j$ define the hyper-ellipse center point, $b_j$ are the hyper-ellipse axis lengths, and the values $g_j$ that
satisfy Equation (3) define the points that lie within or on the hyper-ellipse boundary. When all of the axes
are the same length, the hyper-ellipse is a hyper-sphere. In a preferred embodiment of the present
invention, in at least one of the neurons, at least two of the axes are of different length. By way of example,
there is shown in Fig. 2 elliptical neuron 1, having center point $(c_0^1, c_1^1)$ and axes $b_0^1$, $b_1^1$ of different length.
In a preferred embodiment, the axes of the neurons are aligned with the coordinate axes of the feature
space. Those skilled in the art will understand that other neurons having axes that do not all align with the
feature-space axes are within the scope of the invention.
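
The neuron shapes described above can be illustrated with a short membership test. The following Python sketch is not part of the specification. It assumes that Equation (1) has the form "sum over j of |c_j - g_j|^m / (b_j)^m <= A", which reduces to the hyper-ellipse of Equation (3) when m = 2 and A = 1, and that Equation (2) bounds the largest ratio |c_j - g_j| / b_j by 1. All function names are hypothetical.

```python
def in_general_neuron(g, c, b, m=2.0, A=1.0):
    """Assumed form of Equation (1): sum_j |c_j - g_j|^m / b_j^m <= A."""
    total = sum(abs(cj - gj) ** m / bj ** m for gj, cj, bj in zip(g, c, b))
    return total <= A


def in_hyper_ellipse(g, c, b):
    """Equation (3): the special case m = 2, A = 1 of the general form."""
    return in_general_neuron(g, c, b, m=2.0, A=1.0)


def in_hyper_rectangle(g, c, b):
    """Assumed form of Equation (2): max_j |c_j - g_j| / b_j <= 1."""
    return max(abs(cj - gj) / bj for gj, cj, bj in zip(g, c, b)) <= 1.0


center = (0.0, 0.0)
axes = (2.0, 1.0)  # two axes of different length, aligned with the feature axes
```

A point near a corner, such as (1.5, 0.9), lies inside the hyper-rectangle with these axes but outside the hyper-ellipse, which shows how the choice of expression changes the region a neuron covers.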

According to the present invention, each neuron is associated with a particular possible output. For
example, each neuron may correspond to one of the 26 capital letters. Each neuron is associated with only
one of the possible outputs (e.g., letters), but each possible output may have one or more associated
neurons. Furthermore, neurons may overlap one another in feature space. For example, as shown in Fig. 2,
neurons 0, 1, and 7 correspond to the character "A", neurons 2, 3, 5, and 6 correspond to the character
"B", and neuron 4 corresponds to the character "C". Neurons 1 and 7 overlap, as do neurons 2, 3, and 6
and neurons 3, 5, and 6. In an alternative embodiment (not shown), neurons corresponding to different
possible outputs may overlap. The classification system of the present invention may employ the neurons
of Fig. 2 to classify input images representative of the letters A, B, and C.

Referring now to Fig. 3, there is shown a process flow diagram of classification system 300 for
classifying an input (e.g., a bitmap of an optically acquired character image) as one of a set of possible
outputs (e.g., characters) according to a preferred embodiment of the present invention. In the preferred
embodiment shown in Fig. 3, the neurons in classification system 300 are processed in parallel. In an
alternative embodiment (not shown), the neurons of classification system 300 may be processed in series.
Means 302 is provided for receiving an input image bitmap and generating a feature vector that represents
information contained in that bitmap. Means 304 and 306 are provided for comparing the feature vector
generated by means 302 to a set of neurons, at least one of which has two or more axes of different length.
Classification system 300 selects one of the possible outputs based upon that comparison.

In a preferred embodiment of the present invention, classification system 300 classifies optically
acquired character bitmaps using a network of hyper-elliptical neurons. Means 302 of classification system
300 receives as input the bitmap of an optically acquired character image to be classified and generates a
corresponding feature vector F. Means 304 then determines an "elliptical distance" $r_x$ as a function of the
center and axes of each of the $E_{num}$ hyper-elliptical neurons x in the network and feature vector F, where:

$$r_x = \sum_{j=0}^{k-1} \frac{(f_j - c_j^x)^2}{(b_j^x)^2} \qquad (4)$$

In Equation (4), $c_j^x$ and $b_j^x$ define the center point and axis lengths, respectively, of neuron x, where x runs
from 0 to $E_{num}-1$, and $f_j$ are the elements of feature vector F. Those skilled in the art would recognize that
distance measures different from that of Equation (4) may also be used.
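
As an illustration only, the elliptical distance can be computed in a few lines of Python. The sketch assumes Equation (4) is the sum of squared, axis-normalized offsets between the feature vector and the neuron center; the function name and the sample values are hypothetical.

```python
def elliptical_distance(f, center, axes):
    """Assumed form of Equation (4): r_x = sum_j (f_j - c_j)^2 / (b_j)^2."""
    return sum((fj - cj) ** 2 / bj ** 2 for fj, cj, bj in zip(f, center, axes))


# A 2-dimensional neuron centered at the origin with axis lengths 2 and 1.
r_center = elliptical_distance((0.0, 0.0), (0.0, 0.0), (2.0, 1.0))
r_outside = elliptical_distance((1.0, 1.0), (0.0, 0.0), (2.0, 1.0))
```

Because each offset is divided by the corresponding axis length, a displacement along a long axis contributes less to the distance than the same displacement along a short axis.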

Means 306 determines which, if any, of the $E_{num}$ neurons encompass feature vector F. A neuron
encompasses a feature vector, and may be referred to as an "encompassing neuron", if the feature
vector lies inside the boundary that defines the neuron in feature space. For hyper-ellipses, neuron x
encompasses feature vector F if ($r_x < 1$). If ($r_x = 1$), feature vector F lies on the boundary of neuron x, and
if ($r_x > 1$), feature vector F lies outside neuron x. Since neurons may overlap in feature space, a particular
feature vector may be encompassed by more than one neuron. In Fig. 2, feature vector $F_g$, corresponding
to a particular input image, is encompassed by neurons 2 and 6. Alternatively, a feature vector may lie
inside no neurons, as in the case of feature vector $F_h$ of Fig. 2, which corresponds to a different input
image.
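
The encompassing test can be sketched as follows. This is a minimal illustration, assuming the elliptical distance is the sum of squared, axis-normalized offsets; the `Neuron` class, the function names, and the three sample neurons are hypothetical and do not reproduce the layout of Fig. 2.

```python
from dataclasses import dataclass


@dataclass
class Neuron:
    label: str     # the single possible output associated with this neuron
    center: tuple  # center point coordinates
    axes: tuple    # axis lengths


def elliptical_distance(f, neuron):
    return sum((fj - cj) ** 2 / bj ** 2
               for fj, cj, bj in zip(f, neuron.center, neuron.axes))


def encompassing(f, neurons):
    """Indices x of all neurons whose distance r_x to f is less than 1."""
    return [x for x, n in enumerate(neurons) if elliptical_distance(f, n) < 1.0]


neurons = [
    Neuron("A", (0.0, 0.0), (2.0, 1.0)),
    Neuron("B", (3.0, 0.0), (2.0, 1.0)),
    Neuron("B", (4.0, 0.5), (1.0, 2.0)),  # overlaps the previous "B" neuron
]
```

With these sample neurons, the point (3.5, 0.5) falls in the overlap of the two "B" neurons, while a distant point such as (10.0, 10.0) is encompassed by none.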

Means 308 finds the "closest" neuron for each possible output. As described earlier, each neuron is
associated with one and only one possible output, but each possible output may have one or more neurons
associated with it. Means 308 analyzes all of the neurons associated with each possible output and
determines the neuron "closest" to feature vector F for that output. The "closest" neuron will be the one
having the smallest "distance" measure value $r_x$. In the example of feature vector $F_g$ of Fig. 2, means 308
will select neuron 1 as being the "closest" neuron to feature vector $F_g$ for the character "A". It will also
select neuron 2 as the "closest" neuron for character "B" and neuron 4 for character "C".
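
A minimal sketch of this per-output search is given below. It assumes the distance measures have already been computed for every neuron; the function name and the sample distances are hypothetical.

```python
def closest_per_output(distances, labels):
    """Map each output label to (neuron index, distance) of its closest neuron.

    distances[x] is the distance measure for neuron x, and labels[x] is the
    possible output associated with neuron x.
    """
    best = {}
    for x, (r, label) in enumerate(zip(distances, labels)):
        if label not in best or r < best[label][1]:
            best[label] = (x, r)
    return best


sample_distances = [2.5, 1.8, 0.4, 0.9, 3.0]
sample_labels = ["A", "A", "B", "B", "C"]
closest = closest_per_output(sample_distances, sample_labels)
```

Every output receives a closest neuron, whether or not that neuron encompasses the feature vector.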

Means 310 in Fig. 3 counts votes for each possible output. In a first preferred embodiment, each neuron
that encompasses feature vector F is treated by means 310 as a single "vote" for the output associated
with that neuron. In an alternative preferred embodiment discussed in greater detail with respect to Equation
(7) below, each neuron that encompasses feature vector F is treated by means 310 as representing a
"weighted vote" for the output associated with that neuron, where the weight associated with any particular
neuron is a function of the number of training input feature vectors encompassed by that neuron. In a
preferred embodiment, means 310 implements proportional voting, where the weighted vote for a particular
neuron is equal to the number of feature vectors encompassed by that neuron. For each possible output,
means 310 tallies all the votes for all the neurons that encompass feature vector F. There are three potential
types of voting outcomes: either (1) one output character receives more votes than any other output
character, (2) two or more output characters tie for the most votes, or (3) all output characters receive no
votes, indicating the situation where no neurons encompass feature vector F. In Fig. 2, feature vector $F_g$
may result in the first type of voting outcome: character "B" may receive 2 votes corresponding to
encompassing neurons 2 and 6, while characters "A" and "C" receive no votes. Feature vector $F_h$ of Fig. 2
results in the third type of voting outcome with each character receiving no votes.
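
The vote count and its three possible outcomes can be sketched in Python as follows. The function names and the outcome tags ("winner", "tie", "none") are hypothetical, and the optional weights are assumed to be positive counts of training vectors per neuron.

```python
def tally_votes(distances, labels, weights=None):
    """Sum votes per output over all neurons whose distance is less than 1.

    With weights=None every encompassing neuron casts a single vote;
    otherwise neuron x casts weights[x] votes (assumed positive).
    """
    votes = {}
    for x, (r, label) in enumerate(zip(distances, labels)):
        if r < 1.0:
            votes[label] = votes.get(label, 0) + (1 if weights is None else weights[x])
    return votes


def voting_outcome(votes):
    """Return ("winner", label), ("tie", None), or ("none", None)."""
    if not votes:
        return ("none", None)
    top = max(votes.values())
    leaders = [label for label, v in votes.items() if v == top]
    if len(leaders) == 1:
        return ("winner", leaders[0])
    return ("tie", None)


plain = tally_votes([0.3, 1.4, 0.8, 0.6], ["A", "A", "B", "B"])
weighted = tally_votes([0.3, 1.4, 0.8, 0.6], ["A", "A", "B", "B"], weights=[5, 1, 2, 2])
```

In this example the unweighted tally favors "B" with two encompassing neurons, while the weighted tally favors "A" because its single encompassing neuron carries a larger weight.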

Means 312 determines if the first type of voting outcome resulted from the application of means 310 to
feature vector F. If only one of the possible output characters received the most votes, then means 312
directs the processing of classification system 300 to means 314, which selects that output character as
corresponding to the input character bitmap. Otherwise, processing continues to means 316. For feature
vector $F_g$ in Fig. 2, means 312 determines that character "B" has more votes than any other character and
directs means 314 to select "B" as the character corresponding to feature vector $F_g$. For feature vector $F_h$
in Fig. 2, means 312 determines that no single character received the most votes and directs processing to
means 316.

Means 316 acts as a tie-breaker for the second and third potential voting outcomes, in which no outright
vote-leader exists, either because of a tie or because the feature vector lies inside no neurons. To break the
tie, means 316 selects that neuron x which is "closest" in elliptical distance to feature vector F and
compares $r_x$ to a specified threshold value $\theta^m$. If ($r_x < \theta^m$), then means 318 selects the output character
associated with neuron x as corresponding to the input character bitmap. Otherwise, the tie is not broken
and classification system 300 selects no character for the input image. A "no-character-selected" result is
one of the possible outputs from classification system 300. For example, if classification system 300 is
designed to recognize capital letters and the input image corresponds to the number "7", a
no-character-selected result is an appropriate output.
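
The voting step and the tie-breaker can be combined into one small routine. The sketch below is an illustration under stated assumptions, not the patented implementation: neurons are given as hypothetical (label, center, axes) tuples, the distance is the sum of squared, axis-normalized offsets, the comparison against the threshold is strict, and the default threshold of 1.25 is the preferred value named in this specification.

```python
def classify(f, neurons, theta=1.25):
    """Return the selected output label, or None if no character is selected."""
    if not neurons:
        return None
    dists = [sum((fj - cj) ** 2 / bj ** 2 for fj, cj, bj in zip(f, c, b))
             for _, c, b in neurons]

    # Voting: one vote per encompassing neuron (distance less than 1).
    votes = {}
    for (label, _, _), r in zip(neurons, dists):
        if r < 1.0:
            votes[label] = votes.get(label, 0) + 1
    if votes:
        top = max(votes.values())
        leaders = [label for label, v in votes.items() if v == top]
        if len(leaders) == 1:
            return leaders[0]

    # Tie or no votes: fall back to the closest neuron within the threshold.
    x = min(range(len(neurons)), key=dists.__getitem__)
    return neurons[x][0] if dists[x] < theta else None


separate = [("A", (0.0, 0.0), (1.0, 1.0)),
            ("B", (3.0, 0.0), (1.0, 1.0)),
            ("C", (0.0, 4.0), (2.0, 1.0))]
overlap = [("A", (0.0, 0.0), (2.0, 2.0)),
           ("B", (2.0, 0.0), (2.0, 2.0))]
```

A point just outside a neuron, such as (0.0, 1.1) for the "A" neuron above, is still classified because its distance of about 1.21 is below the threshold, whereas (1.5, 0.0) is too far from every neuron and yields no character.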

Threshold value $\theta^m$ may be any number greater than 1 and is preferably about 1.25. As described
earlier, when feature vector F is inside neuron x, then ($r_x < 1$), and when feature vector F is outside neuron
x, then ($r_x > 1$). If the voting result from means 310 is a tie for the most non-zero votes, then means 316 will
select the output character associated with the encompassing neuron having a center which is "closest" in
elliptical "distance" to feature vector F. Alternatively, if there are no encompassing neurons, means 316 may
still classify the input bitmap as corresponding to the output character associated with the "closest" neuron
x, if ($r_x < \theta^m$). Using a threshold value $\theta^m$ of about 1.25 establishes a region surrounding each neuron used
by means 316 for tie-breaking. In Fig. 2, feature vector $F_h$ will be classified as character "C" if the
"distance" measure $r_4$ is less than the threshold value $\theta^m$; otherwise, no character is selected.

Referring now to Fig. 4, there is shown a schematic diagram of classification system 400 of the present
invention for classifying inputs as corresponding to a set of s possible outputs. Classification system 400
may perform part of the processing performed by classification system 300 of Fig. 3. Classification system
400 accepts feature vector F, represented by feature elements ($f_0, f_1, \ldots, f_{k-1}$), and generates values $q^t$ and
$q^m$ that act as pointers and/or flags to indicate the possible output to be selected. Classification system 400
includes four subsystem levels: input level 402, processing level 404, output level 406, and postprocessing
level 408.

Input level 402 includes the set I of k input processing units $i_j$, where j runs from 0 to k-1. Each input
processing unit $i_j$ receives as input one and only one element of the feature vector F and broadcasts this
value to processing level 404. Input level 402 functions as a set of pass-through, broadcasting elements.

Processing level 404 includes the set E of $E_{num}$ elliptical processing units $e_x$, where x runs from 0 to
$E_{num}-1$. Each elliptical processing unit $e_x$ is connected to and receives input from the output of every input
processing unit $i_j$ of input level 402. Elliptical processing unit $e_x$ implements Equation (4) for neuron x of
classification system 300 of Fig. 3. Like neuron x of classification system 300, each elliptical processing unit
$e_x$ is defined by two vectors of internal parameters: $B^x$ and $C^x$. The elements of vector $B^x$ are the lengths of
the axes of neuron x, where:

$$B^x = (b_0^x, b_1^x, \ldots, b_{k-1}^x) \qquad (5)$$

and the elements of vector $C^x$ are the coordinates of the center point of neuron x, where:

$$C^x = (c_0^x, c_1^x, \ldots, c_{k-1}^x) \qquad (6)$$



Each elliptical processing unit $e_x$ of processing level 404 computes the distance measure $r_x$ from
feature vector F to the center of neuron x. Processing level 404 is associated with means 304 of
classification system 300. If ($r_x < 1$), then elliptical processing unit $e_x$ is said to be activated; otherwise,
elliptical processing unit $e_x$ is not activated. In other words, elliptical processing unit $e_x$ is activated when
neuron x encompasses feature vector F. Each elliptical processing unit $e_x$ broadcasts the computed
distance measure $r_x$ to only two output processing units of output level 406.

Output level 406 includes two parts: output-total part 410 and output-minimize part 412. Output-total part
410 contains the set $O^t$ of s output processing units $o_n^t$, and output-minimize part 412 contains the set $O^m$ of
s output processing units $o_n^m$, where n runs from 0 to s-1, and where s is also the number of possible outputs
for which classification system 400 has been trained. For example, when classifying capital letters, s = 26.
Each processing unit pair ($o_n^t$, $o_n^m$) is associated with only one possible output and vice versa.

Each elliptical processing unit $e_x$ of processing level 404 is connected to and provides output to only
one output processing unit $o_n^t$ of output-total part 410 and to only one output processing unit $o_n^m$ of
output-minimize part 412. However, each output processing unit $o_n^t$ and each output processing unit $o_n^m$ may be
connected to and receive input from one or more elliptical processing units $e_x$ of processing level 404.
These relationships are represented by connection matrices $W^t$ and $W^m$, both of which are of dimension
(s x $E_{num}$). In a preferred embodiment, if there is a connection between elliptical processing unit $e_x$ of
processing level 404 and output processing unit $o_n^t$ of output-total part 410 of output level 406, an entry
$w_{nx}^t$ in connection matrix $W^t$ will have a value that is equal to the number of training input feature vectors
encompassed by neuron x; otherwise, it has value 0. In a further preferred embodiment, entry $w_{nx}^t$ has a
value 1 if there is a connection between elliptical processing unit $e_x$ and output processing unit $o_n^t$.

Connection matrix $W^m$ represents the connections between processing level 404 and output-minimize
part 412 of output level 406 and is related to connection matrix $W^t$. An entry $w_{nx}^m$ in connection matrix $W^m$
will have a value of 1 for every entry $w_{nx}^t$ in connection matrix $W^t$ that is not zero. Otherwise, entry $w_{nx}^m$
will have a value of 0.

Each output processing unit $o_n^t$ in output-total part 410 computes an output value $o_n^t$, where:

$$o_n^t = \sum_{x=0}^{E_{num}-1} w_{nx}^t \, T(r_x) \qquad (7)$$

where the function $T(r_x)$ returns the value 0 if ($r_x > 1$); otherwise, it returns the value 1. In other words, the
function $T(r_x)$ returns the value 1 if elliptical processing unit $e_x$ of processing level 404 is activated. Output
processing unit $o_n^t$ counts the votes for the possible output with which it is associated and outputs the total.
Output-total part 410 of output level 406 is associated with means 306 and means 310 of classification
system 300.
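
The output-total computation can be sketched with plain Python lists. This illustration assumes Equation (7) is a weighted sum of the activation function over all neurons, with the connection matrix stored as s rows of $E_{num}$ entries; the function names and the sample data are hypothetical.

```python
def build_total_matrix(labels, outputs, counts=None):
    """Build the (s x E_num) connection matrix for the output-total part.

    Entry [n][x] is nonzero only when neuron x is associated with output n.
    With counts=None each connection has value 1; otherwise it has counts[x],
    the number of training feature vectors encompassed by neuron x.
    """
    return [[(1 if counts is None else counts[x]) if label == out else 0
             for x, label in enumerate(labels)]
            for out in outputs]


def T(r):
    """Activation as stated in the text: 0 if r > 1, otherwise 1."""
    return 0 if r > 1 else 1


def output_totals(W_t, distances):
    """Assumed form of Equation (7): total_n = sum_x W_t[n][x] * T(r_x)."""
    return [sum(w * T(r) for w, r in zip(row, distances)) for row in W_t]


labels = ["A", "A", "B", "B", "C"]
outputs = ["A", "B", "C"]
distances = [1.6, 2.0, 0.4, 0.7, 3.1]
W_plain = build_total_matrix(labels, outputs)
W_weighted = build_total_matrix(labels, outputs, counts=[10, 4, 7, 3, 9])
```

With unit entries the totals are simple vote counts; with training-count entries they implement the proportional voting described earlier.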

Similarly, each output processing unit $o_n^m$ in output-minimize part 412 computes an output value $o_n^m$,
where:

$$o_n^m = \mathop{\mathrm{MIN}}_{x=0}^{E_{num}-1} \left( w_{nx}^m \, r_x \right) \quad \text{for all } w_{nx}^m \neq 0 \qquad (8)$$

where the function "MIN" returns the minimum value of ($w_{nx}^m r_x$) over all the elliptical processing units $e_x$.
Therefore, each output processing unit $o_n^m$ examines each of the elliptical processing units $e_x$ to which it is
connected and outputs a real value equal to the minimum output value from these elliptical processing
units. Output-minimize part 412 of output level 406 is associated with means 308 of classification system
300.
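
A corresponding sketch for the output-minimize part follows. It assumes Equation (8) takes the minimum of the weighted distances over the connected neurons only (nonzero matrix entries); returning infinity for an output with no connected neurons is an added convention, not something stated in the text, and the names are hypothetical.

```python
import math


def output_minima(W_m, distances):
    """Assumed form of Equation (8): min over connected x of W_m[n][x] * r_x."""
    minima = []
    for row in W_m:
        connected = [w * r for w, r in zip(row, distances) if w != 0]
        # Convention for this sketch: an output with no neurons yields infinity.
        minima.append(min(connected) if connected else math.inf)
    return minima


W_m = [[1, 1, 0, 0, 0],
       [0, 0, 1, 1, 0],
       [0, 0, 0, 0, 1]]
minima = output_minima(W_m, [1.6, 2.0, 0.4, 0.7, 3.1])
```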

Postprocessing level 408 includes two postprocessing units $p^t$ and $p^m$. Postprocessing unit $p^t$ is
connected to and receives input from every output processing unit $o_n^t$ of output-total part 410 of output level
406. Postprocessing unit $p^t$ finds the output processing unit $o_n^t$ that has the maximum output value and
generates the value $q^t$. If output processing unit $o_n^t$ of output-total part 410 has an output value greater than
those of all the other output processing units of output-total part 410, then the value $q^t$ is set to n, the
index for that output processing unit. For example, when classifying capital letters, n may be 0 for "A" and
1 for "B", etc. Otherwise, the value $q^t$ is set to -1 to indicate that output-total part 410 of output level 406
did not classify the input. Postprocessing unit $p^t$ of postprocessing level 408 is associated with means 312
of classification system 300.

Similarly, postprocessing unit $p^m$, the other postprocessing unit in postprocessing level 408, is
connected to and receives input from every output processing unit $o_n^m$ of output-minimize part 412 of
output level 406. Postprocessing unit $p^m$ finds the output processing unit $o_n^m$ that has the minimum output
value and generates the value $q^m$. If output processing unit $o_n^m$ of output-minimize part 412 has an output
value less than a specified threshold $\theta^m$, then the value $q^m$ is set to the corresponding index n. Otherwise,
the value $q^m$ is set to -1 to indicate that output-minimize part 412 of output level 406 did not classify the
input, because the feature vector F is outside the threshold region surrounding neuron x for all neurons x.
The threshold $\theta^m$ may be the same threshold $\theta^m$ used in classification system 300 of Fig. 3. Postprocessing
unit $p^m$ of postprocessing level 408 is associated with means 316 of classification system 300.

Classification of the input is completed by analyzing the values q^t and q^m. If (q^t ≠ -1), then the input is
classified as possible output q^t of the set of s possible outputs. If (q^t = -1) and (q^m ≠ -1), then the input is
classified as possible output q^m of the set of s possible outputs. Otherwise, if both values are -1, then the
input is not classified as any of the s possible outputs.
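
The two-stage decision made by the postprocessing level may be sketched, by way of example, as follows. The sketch is in Python; the function names are hypothetical, and the inputs are assumed to be the lists of output values already computed by output-total part 410 and output-minimize part 412.

```python
def postprocess_total(totals):
    """Index of the output unit whose total is strictly greater than
    all others, or -1 if no single unit wins."""
    best = max(totals)
    winners = [n for n, value in enumerate(totals) if value == best]
    return winners[0] if len(winners) == 1 else -1


def postprocess_minimize(minima, threshold):
    """Index of the output unit with the smallest value, provided that
    value is below the threshold; otherwise -1."""
    n = min(range(len(minima)), key=minima.__getitem__)
    return n if minima[n] < threshold else -1


def classify(totals, minima, threshold):
    """Use the voting result when it is decisive, else fall back to the
    minimum-distance result; -1 means the input is not classified."""
    q_total = postprocess_total(totals)
    if q_total != -1:
        return q_total
    return postprocess_minimize(minima, threshold)


print(classify([2, 1, 0], [0.5, 0.9, 3.0], 1.2))  # 0  (decisive vote)
print(classify([1, 1, 0], [0.9, 0.5, 3.0], 1.2))  # 1  (tie, nearest neuron)
print(classify([0, 0, 0], [2.0, 1.5, 3.0], 1.2))  # -1 (not classified)
```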

TRAINING SYSTEM 

A neural network must be trained before it may be used to classify inputs. The training system of the
present invention performs this required training by generating at least one non-spherical neuron in the
k-dimensional feature space. The training system is preferably implemented off line prior to the development
of a classification system.

The training system of the present invention generates neurons based upon a set of training inputs,
where each training input is known to correspond to one of the possible outputs in the classification set.
Continuing with the example of capital letters used to describe classification system 300, each training input
may be a bitmap corresponding to one of the characters from "A" to "Z". Each character must be
represented by at least one training input, although typically 250 to 750 training inputs are used for each
character.

Referring now to Fig. 5, there is shown a process flow diagram of training system 500 for generating
neurons in k-dimensional feature space that may be used in classification system 300 of Fig. 3 or in
classification system 400 of Fig. 4. For example, when training for output classification, training system 500
sequentially processes a set of training bitmap inputs corresponding to known outputs. At a particular point
in the training, there will be a set of existing feature vectors that correspond to the training inputs previously
processed and a set of existing neurons that have been generated from those existing feature vectors. For
each training input, training system 500 generates a feature vector in a feature space that represents
information contained in that training input.

Training system 500 applies two rules in processing each training input. The first training rule is that if
the feature vector, corresponding to the training input currently being processed, is encompassed by any
existing neurons that are associated with a different known output, then the boundaries of those existing
neurons are spatially adjusted to exclude that feature vector -- that is, to ensure that that feature vector is
not inside the boundary of those existing neurons. Otherwise, neurons are not spatially adjusted. For
example, if the current training input corresponds to the character "R" and the feature vector corresponding
to that training input is encompassed by two existing "P" neurons and one existing "B" neuron, then the
boundaries of those three existing neurons are spatially adjusted to ensure they do not encompass the
current feature vector.

The second training rule is that if the current feature vector is not encompassed by at least one existing
neuron that is associated with the same known output, then a new neuron is created. Otherwise, no new
neuron is created for the current feature vector. For example, if the current training input corresponds to the
character "W" and the feature vector corresponding to that training input is not encompassed by any
existing neuron that is associated with the character "W", then a new "W" neuron is created to encompass
that current feature vector. In a preferred embodiment, a new neuron is created by generating a temporary
hyper-spherical neuron and then spatially adjusting that temporary neuron to create the new neuron. In an
alternative preferred embodiment, the temporary neuron may be a non-spherical hyper-ellipse.

In a preferred embodiment of the present invention, training system 500 generates hyper-elliptical
neurons from a set of training bitmap inputs corresponding to known characters. Training system 500 starts
with no existing feature vectors and no existing neurons. Processing of training system 500 begins with
means 502 which selects as the current training input a first training input from a set of training inputs.
Means 504 generates the feature vector F that corresponds to the current training input.
When the first training input is the current training input, there are no existing neurons and therefore no
existing neurons that encompass feature vector F. In that case, processing of training system 500 flows to
means 514 which creates a new neuron centered on feature vector F. The new neuron is preferably defined
by Equation (3), where all the new neuron axes are set to the same length, that is, (b_j = λ) for all j. Since the
new neuron axes are all the same length, the new neuron is a hyper-sphere in feature space of radius λ. In
a preferred embodiment, the value of constant λ may be twice as large as the largest feature element of
all the feature vectors F for the entire set of training inputs. Since there are no existing feature vectors when
processing the first training input, training system 500 next flows to means 528 from which point the
processing of training system 500 may be described more generally.
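
By way of example only, the creation of the initial hyper-spherical neuron might be sketched in Python as follows. The function name and the dictionary layout are hypothetical, and the sketch assumes non-negative feature elements so that "largest feature element" can be read as a simple maximum.

```python
def new_hypersphere(f, all_feature_vectors):
    """Create a neuron centered on f whose axes all share one length:
    twice the largest feature element found in the training set."""
    radius = 2 * max(max(v) for v in all_feature_vectors)
    return {"center": list(f), "axes": [radius] * len(f)}


vectors = [[0.2, 0.9, 0.4], [0.7, 0.1, 0.3]]
neuron = new_hypersphere(vectors[0], vectors)
print(neuron)  # {'center': [0.2, 0.9, 0.4], 'axes': [1.8, 1.8, 1.8]}
```
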

Means 528 determines whether the current training input is the last training input in the set of training
inputs. If not, then means 528 directs processing of training system 500 to means 530 which selects the
next training input as the current training input. Means 504 then generates the feature vector F corresponding
to the current training input.

Means 506 and 508 determine which, if any, existing neurons are to be spatially adjusted to avoid
encompassing feature vector F. In a preferred embodiment, means 510 adjusts an existing neuron if that
neuron is not associated with the same known character as the current training input (as determined by
means 506) and if it encompasses feature vector F (as determined by means 508). Means 508 determines
if an existing neuron encompasses feature vector F by calculating the "distance" measure r_x of
Equation (4) and testing whether (r_x < 1) as described earlier.
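
The selection performed by means 506 and 508 may be illustrated by the following Python sketch. Equation (4) is not reproduced in this part of the specification, so the sketch assumes the usual hyper-elliptical form, namely the sum over all axes of ((f_j - c_j) / b_j)^2; that form, the tuple layout, and the function names are assumptions.

```python
def distance_measure(center, axes, f):
    """Assumed form of the Equation (4) measure: values below 1 mean
    that f lies inside the hyper-ellipse."""
    return sum(((fj - cj) / bj) ** 2 for fj, cj, bj in zip(f, center, axes))


def neurons_to_adjust(neurons, f, label):
    """Indices of neurons that carry a different label than the training
    input and that encompass its feature vector f."""
    return [i for i, (center, axes, neuron_label) in enumerate(neurons)
            if neuron_label != label and distance_measure(center, axes, f) < 1.0]


neurons = [([0.0, 0.0], [2.0, 1.0], "P"),   # encompasses f, different label
           ([3.0, 0.0], [1.0, 1.0], "B"),   # does not encompass f
           ([0.0, 0.0], [1.0, 1.0], "R")]   # same label, never adjusted
print(neurons_to_adjust(neurons, [0.5, 0.5], "R"))  # [0]
```
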

In a preferred embodiment, means 510 spatially adjusts an existing neuron by optimally shrinking it
along only one axis. In another preferred embodiment, means 510 shrinks an existing neuron proportionally
along one or more axes. These shrinking methods are explained in greater detail later in this specification.
After processing by means 510, the current feature vector is not encompassed by any existing neurons that
are associated with a character which is different from the character associated with the training input.
Hence, the current feature vector lies either outside or on the boundaries of such existing neurons.

Training system 500 also determines if a new neuron is to be created and, if so, creates that new
neuron. A new neuron is created (by means 514) if the feature vector F is not encompassed by any existing
neuron associated with the same character as the training input (as determined by means 512). As
described above, means 514 creates a new neuron that is, preferably, a hyper-sphere of radius λ.

Training system 500 then tests and, if necessary, spatially adjusts each new neuron created by means
514 to ensure that it does not encompass any existing feature vectors that are associated with a character
which is different from the character associated with the training input. Means 516, 524, and 526 control the
sequence of testing a new neuron against each of the existing feature vectors by selecting one of the
existing feature vectors at a time. If a new neuron is associated with a character different from that of the
currently selected existing feature vector (as determined by means 518) and if the new neuron encompasses
that selected existing feature vector (as determined by means 520 using Equation (4)), then means
522 spatially adjusts the new neuron by one of the same shrinking algorithms employed by means 510.
Training system 500 continues to test and adjust a new neuron until all existing feature vectors have been
processed. Since the hyper-spherical neuron created by means 514 is adjusted by means 522, that
hyper-spherical neuron is a temporary neuron with temporary neuron axes of equal length. Processing of training
system 500 then continues to means 528 to control the selection of the next training input.

In a preferred embodiment, the steps of (1) shrinking existing neurons for a given input, and (2) creating 
and shrinking a new neuron created for that same input may be performed in parallel. Those skilled in the 
art will understand that these two steps may also be performed sequentially in either order. 

In a preferred embodiment, after all of the training inputs in the set of training inputs have been
processed sequentially, means 528 directs processing of training system 500 to means 532. After
processing a set of training inputs with their corresponding feature vectors, feature space is populated with
both feature vectors and neurons. After processing the set of training inputs one time, some feature vectors
may not be encompassed by any neurons. This occurs when feature vectors that were, at some point in
the training process, encompassed by neuron(s) of the same character become excluded from those
neurons when those neurons are shrunk to avoid subsequent feature vectors associated with a different
character. In such a situation, means 532 directs processing to return to means 502 to repeat processing of
the entire set of training inputs. When repeating this processing, the previously created neurons are
retained. By iteratively repeating this training process, new neurons are created with each iteration until
eventually each and every feature vector is encompassed by one or more neurons that are associated with
the proper output and no feature vectors are encompassed by neurons associated with different possible
outputs. Moreover, this iterative training is guaranteed to converge in a finite period of time with the
maximum number of iterations being equal to the total number of training inputs.
After training system 500 completes its processing, the feature space is populated with neurons that
may then be used by classification system 300 or classification system 400 to classify an unknown
input into one of a plurality of possible outputs.
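
The overall flow of the two training rules and the repeated passes may be sketched, under stated assumptions, as follows (Python). The distance measure is again assumed to be the sum of ((f_j - c_j) / b_j)^2. For brevity the sketch does NOT use either shrinking method of this specification; it substitutes a simple uniform rescaling of all axes that places the offending feature vector just outside the boundary. It further assumes that no two training inputs of different characters share an identical feature vector. All names are hypothetical.

```python
def r_measure(neuron, f):
    """Assumed distance measure: sum over axes of ((f_j - c_j) / b_j)^2."""
    return sum(((fj - cj) / bj) ** 2
               for fj, cj, bj in zip(f, neuron["center"], neuron["axes"]))


def encompasses(neuron, f):
    return r_measure(neuron, f) < 1.0


def shrink_to_exclude(neuron, f):
    """Stand-in shrink (not the one-axis or proportional method): rescale
    every axis by one factor so that f ends up just outside the boundary."""
    scale = r_measure(neuron, f) ** 0.5 * (1.0 - 1e-9)
    neuron["axes"] = [b * scale for b in neuron["axes"]]


def train(samples, radius):
    """samples: list of (feature_vector, label); radius: initial axis length."""
    neurons, seen = [], []
    for _ in range(len(samples)):      # passes are bounded by the input count
        created = False
        for f, label in samples:
            # First rule: other-label neurons must not encompass f.
            for n in neurons:
                if n["label"] != label and encompasses(n, f):
                    shrink_to_exclude(n, f)
            # Second rule: create a neuron if no same-label neuron covers f.
            if not any(n["label"] == label and encompasses(n, f) for n in neurons):
                new = {"center": list(f), "axes": [radius] * len(f), "label": label}
                for g, other in seen:  # test against existing feature vectors
                    if other != label and encompasses(new, g):
                        shrink_to_exclude(new, g)
                neurons.append(new)
                created = True
            if (f, label) not in seen:
                seen.append((f, label))
        if not created:                # a pass with no new neuron: done
            break
    return neurons


samples = [((0.0, 0.0), "A"), ((1.0, 0.0), "B"),
           ((0.2, 0.0), "A"), ((0.0, 3.0), "A")]
neurons = train(samples, radius=2.0)
print(len(neurons))  # 3
```

In this small example the second pass creates no neuron, so training stops with every feature vector covered by a neuron of its own character and by no neuron of a different character.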

OPTIMAL ONE-AXIS SHRINKING

As mentioned earlier, in a preferred embodiment, training system 500 spatially adjusts the boundary of
a hyper-elliptical neuron to exclude a particular feature vector by optimally shrinking along one axis. Means
510 and 522 of training system 500 may perform this one-axis shrinking by (1) identifying the axis to shrink,
and (2) calculating the new length for that axis.

Training system 500 identifies the axis n to shrink by the formula:

$$ n = \operatorname*{argmax}_{0 \le i \le k-1} \left[ \left( \prod_{j=0,\, j \ne i}^{k-1} b_j \right) \frac{|f_i - c_i|}{\sqrt{1 - \sum_{j=0,\, j \ne i}^{k-1} \frac{(f_j - c_j)^2}{b_j^2}}} \right] \qquad (9) $$

where the function "argmax" returns the value of i that maximizes the expression in the square brackets for
any i from 0 to k-1; c_j and b_j define the center point and axis lengths, respectively, of the neuron to be
adjusted; and f_j define the feature vector to be excluded by that neuron.

Training system 500 then calculates the new length b_n' for axis n by the equation:

$$ b_n' = \sqrt{ \frac{(f_n - c_n)^2}{1 - \sum_{j=0,\, j \ne n}^{k-1} \frac{(f_j - c_j)^2}{b_j^2}} } \qquad (10) $$

In one-axis shrinking, all other axes retain their original lengths.

One-axis shrinking of an original hyper-elliptical neuron according to Equations (9) and (10) results in an
adjusted neuron with the greatest hyper-volume V that satisfies the following four criteria:

(1) The adjusted neuron is a hyper-ellipse;

(2) The center point of the original neuron is the same as the center point of the adjusted neuron;

(3) The feature vector to be excluded lies on the boundary of the adjusted neuron; and

(4) All points within or on the boundary of the adjusted neuron lie within or on the boundary of the
original neuron.

The hyper-volume V is defined by:

$$ V = C_k \prod_{j=0}^{k-1} b_j \qquad (11) $$

where C_k is a constant that depends on the value of k, where k is the dimension of the feature space, and b_j
are the lengths of the axes defining the adjusted neuron. One-axis shrinking, therefore, provides a first
method for optimally adjusting neurons according to the present invention.
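
One-axis shrinking may be illustrated by the following Python sketch, which is derived from the four criteria above and is not copied from the printed equations. It assumes that the neuron boundary is the set of points where the sum of ((x_j - c_j) / b_j)^2 equals 1, so that criterion (3) fixes the new length of any candidate axis, and it picks the axis whose shrinking leaves the largest product of axis lengths (the constant factor of the hyper-volume does not affect the choice). The function name is hypothetical, and the feature vector is assumed to lie strictly inside the neuron.

```python
import math


def one_axis_shrink(center, axes, f):
    """Return (n, new_axes): the axis chosen for shrinking and the axis
    lengths after f has been placed on the boundary."""
    k = len(axes)
    terms = [((f[j] - center[j]) / axes[j]) ** 2 for j in range(k)]
    best = None
    for i in range(k):
        rest = sum(terms[j] for j in range(k) if j != i)
        # Length of axis i that puts f exactly on the boundary.
        new_len = abs(f[i] - center[i]) / math.sqrt(1.0 - rest)
        # Product of axis lengths if only axis i were shrunk.
        volume = new_len * math.prod(axes[j] for j in range(k) if j != i)
        if best is None or volume > best[0]:
            best = (volume, i, new_len)
    _, n, new_len = best
    new_axes = list(axes)
    new_axes[n] = new_len
    return n, new_axes


center, axes, f = (0.0, 0.0), (2.0, 1.0), (1.0, 0.2)
n, new_axes = one_axis_shrink(center, axes, f)
r_after = sum(((fj - cj) / bj) ** 2 for fj, cj, bj in zip(f, center, new_axes))
print(n, new_axes, r_after)
```

In this example axis 0 is shortened from 2.0 to roughly 1.02 while axis 1 keeps its length, and the distance measure of the excluded vector becomes 1 to within rounding error, so the vector lies on the adjusted boundary.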

PROPORTIONAL SHRINKING ALGORITHM

In an alternative preferred embodiment, training system 500 spatially adjusts the boundary of a
hyper-elliptical neuron to exclude a particular feature vector by shrinking proportionally along one or more axes.
Means 510 and 522 of training system 500 may perform proportional shrinking by calculating the vector ΔB
of axis length changes Δb_j, where:

(12)

where:

$$ \Delta B = (\Delta b_0, \Delta b_1, \ldots, \Delta b_{k-1}), \qquad (13) $$

and

$$ \Gamma = (\gamma_0, \gamma_1, \ldots, \gamma_{k-1}), \qquad (21) $$

where |f_0 - c_0| is the absolute value of (f_0 - c_0); ||F - C|| is the magnitude of the vector difference between F
and C; c_j and b_j define the center point and axis lengths, respectively, of the neuron to be adjusted; f_j are
the elements of the feature vector to be excluded from that neuron; and α and γ_j may be constants. The
new axis lengths b_j' for the adjusted neuron are calculated by:



$$ b_j' = b_j + \Delta b_j \qquad (22) $$

for j from 0 to k-1.

In proportional shrinking, training system 500 determines the projections of a vector onto the axes of the
neuron to be adjusted, where the vector points from the center of that neuron to the feature vector to be
excluded. These projections are represented by the vector of cosines of Equation (14). Training system 500
then determines how much to shrink each neuron axis based on the relationship between the length of the
axis and the length of the projection onto that axis.

In a preferred embodiment, the constant α in Equation (12) is selected to be less than 1. In this case,
training system 500 may perform iterative shrinking, where the neuron is slowly adjusted over multiple
axis-shrinking steps until it is determined that the feature vector to be excluded is outside the adjusted neuron.
In a preferred embodiment, parameter γ_j may be set to a positive value that is roughly 0.001 times the size
of axis j to ensure that proportional shrinking eventually places the feature vector outside the neuron. In an
alternative preferred embodiment, the parameters γ_j may be error functions based on the distance from the
feature vector to the boundary of the slowly adjusted neuron. In such case, training system 500 may
operate as a proportional integral controller for adjusting neurons.

ORDERING OF TRAINING INPUTS 

In a preferred embodiment of the present invention, the set of training inputs, used sequentially by the
training system to generate neurons, may be organized according to input quality. The training inputs may
be ordered to train with higher quality inputs before proceeding to those of lower quality. This quality
ordering of training inputs ensures that neurons are centered about feature vectors that correspond to inputs
of higher quality. Such ordered training may improve the performance efficiency of a classification system
by reducing the numbers of neurons needed to define the classification system. Such ordering may also
reduce the numbers of misclassifications and non-classifications made by the classification system. A
misclassification is when a classification system selects one possible output when, in truth, the input
corresponds to a different possible output. A non-classification is when a classification system fails to select
one of the known outputs and instead outputs a no-output-selected result.

Referring now to Figs. 1(a), 1(b), 1(c), and 1(d), there are shown bitmap representations of a nominal
letter "O", a degraded letter "O", a nominal number "7", and a degraded number "7", respectively. A nominal
input is an ideal input with no noise associated with it. A degraded input is one in which noise has created
deviations from the nominal input. Degraded inputs may result from either controlled noise or real
unpredictable noise.

In a preferred embodiment, the training system of the present invention may train with training inputs of
three different quality levels. The first level of training inputs are nominal inputs like those presented in Figs.
1(a) and 1(c). The second level of training inputs are controlled noise inputs, a type of degraded input
created by applying defined noise functions or signals with different characteristics, either independently or
in combination, to nominal inputs. The third level of training inputs are real noise inputs, a second type of
degraded inputs which, in the case of characters, may be optically acquired images of known characters.
Such degraded inputs have real unpredictable noise. Figs. 1(b) and 1(d) present representations of possible
controlled noise inputs and real noise inputs. In a preferred embodiment, the nominal inputs have the
highest quality, with the controlled noise inputs and real noise inputs of lesser quality.
Depending upon the controlled noise functions and signals applied, a particular controlled-noise input may
be of greater or lesser quality than a particular real-noise input.

The quality of a particular degraded input -- of either controlled-noise or real-noise variety -- may be
determined by comparing the degraded input to a nominal input corresponding to the same known
character. In a preferred embodiment, a quality measure may be based on the number of pixels that differ
between the two inputs. In another preferred embodiment, the quality measure may be based on
conventional feature measures such as Grid or Hadamard features.

In a preferred embodiment, training systems of the present invention train first with the nominal inputs
and then later with degraded controlled-noise and real-noise inputs. In this preferred embodiment, training
with inputs corresponding to Figs. 1(a) and 1(c) would precede training with those of Figs. 1(b) and 1(d). In
another preferred embodiment, the training system trains with all inputs of the same known character prior
to proceeding to the next known character, and the training inputs of each known character are internally
organized by quality. In this preferred embodiment, training with Fig. 1(a) precedes that with Fig. 1(b), and
training with Fig. 1(c) precedes that with Fig. 1(d). Those skilled in the art will understand that the exact
overall sequence of training with all of the inputs is of lesser importance than ordering of inputs by quality
for each different known character.
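
By way of illustration, ordering by the pixel-difference quality measure might be sketched in Python as follows. The sketch groups the training inputs by character and orders each group from highest to lowest quality; the function names, the tiny 2x2 bitmaps, and the use of a single nominal bitmap per character are assumptions made for the example.

```python
def pixel_difference(bitmap, nominal):
    """Number of pixels that differ between two equally sized bitmaps;
    a smaller count is treated as higher quality."""
    return sum(a != b
               for row_a, row_b in zip(bitmap, nominal)
               for a, b in zip(row_a, row_b))


def order_training_inputs(inputs, nominals):
    """Group inputs by character, then sort each group so that inputs
    closest to the nominal bitmap come first."""
    return sorted(inputs,
                  key=lambda item: (item[0],
                                    pixel_difference(item[1], nominals[item[0]])))


nominals = {"O": [[1, 1], [1, 1]], "7": [[1, 1], [0, 1]]}
inputs = [("O", [[1, 0], [0, 1]]), ("7", [[1, 1], [0, 1]]),
          ("O", [[1, 1], [1, 1]]), ("7", [[0, 1], [0, 1]])]
ordered = order_training_inputs(inputs, nominals)
print([(ch, pixel_difference(bm, nominals[ch])) for ch, bm in ordered])
# [('7', 0), ('7', 1), ('O', 0), ('O', 2)]
```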

REFINEMENT OF NEURONS

After the training system of the present invention has completed training, the feature space is populated
with neurons that encompass feature vectors, with one feature vector corresponding to each distinct training
input. Each neuron may encompass one or more feature vectors -- the one at the center of the neuron that
was used to create the neuron and possibly other feature vectors corresponding to inputs associated with
the same known character.

Depending upon the quality ordering of the training inputs used in the sequential training, a particular
neuron may encompass these feature vectors in a more or less efficient manner. For example, if the feature
vector used to create a particular neuron corresponds to a highly degraded input, then that feature vector
will lie at the center of that neuron. That same neuron may also encompass other feature vectors
corresponding to nominal inputs and inputs of lesser degradation. Such a neuron may not be the most
efficient neuron for encompassing that set of feature vectors. A classification system using such a neuron
may make more misclassifications and non-classifications than one using a more efficient neuron.

A refinement system of the present invention spatially adjusts neurons, created during training, to create
more efficient neurons. This refinement system may characterize the spatial distribution of feature vectors
encompassed by a particular neuron and then spatially adjust that neuron. Such spatial adjustment may
involve translating the neuron from its current center point toward the mean of the spatial distribution of
those feature vectors. After translating the neuron, the axis lengths may be adjusted to ensure that feature
vectors of the same output character are encompassed by the neuron and to ensure that feature vectors of
different output characters are excluded.
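
The translation step of such a refinement may be sketched as follows (Python). Only the movement of the center toward the mean is shown; the subsequent adjustment of axis lengths is not specified in detail here and is omitted. The function name and the `step` parameter, which controls how far toward the mean the center moves, are assumptions.

```python
def translate_toward_mean(center, encompassed, step=1.0):
    """Move the neuron center toward the mean of the feature vectors it
    encompasses; step=1.0 moves it all the way to the mean."""
    k = len(center)
    mean = [sum(v[j] for v in encompassed) / len(encompassed) for j in range(k)]
    return [c + step * (m - c) for c, m in zip(center, mean)]


# A neuron created from a degraded input sits at the edge of its own cluster.
center = [4.0, 0.0]
encompassed = [[0.0, 0.0], [2.0, 0.0], [4.0, 0.0]]
print(translate_toward_mean(center, encompassed))       # [2.0, 0.0]
print(translate_toward_mean(center, encompassed, 0.5))  # [3.0, 0.0]
```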

In an alternative embodiment, the refinement system may spatially adjust two or more neurons of the
same character to create one or more neurons that more efficiently encompass the same feature vectors,
where a feature vector from one original neuron may be encompassed by a different more efficient neuron.
For example, before refinement, a first neuron may encompass feature vectors F1, F2, and F3, and a
second neuron may encompass feature vectors F4, F5, F6, and F7. After refinement, feature vectors F1, F2,
F3, and F4 may be encompassed by a third neuron, and feature vectors F5, F6, and F7 may be
encompassed by a fourth neuron, where the centers and axis lengths of the third and fourth neurons are all
different from those of the first and second neurons.

CLASSIFYING SYSTEMS WITH CLUSTER CLASSIFIERS 

In a first preferred embodiment of the present invention, a classification system classifies inputs into
one of a set of possible outputs by comparing the feature vector, for each input to be classified, with every
neuron in the feature space. Such classification systems are presented in Figs. 3 and 4.

Referring now to Fig. 6, there is shown classification system 600 -- a second preferred embodiment of
the present invention -- in which inputs are classified into one of a set of possible outputs using neurons
and cluster classifiers. Classification system 600 includes top-level classifier 602 and two or more cluster
classifiers 604, 606, ..., 608. Top-level classifier 602 classifies inputs into appropriate clusters of inputs. For
example, where classification system 600 classifies characters, top-level classifier 602 may classify input
bitmaps corresponding to optically acquired characters into clusters of characters.

The characters clustered together may be those represented by similar bitmaps, or, in other words,
those characters associated with feature vectors close to one another in feature space. For example, a first
character cluster may correspond to the characters "D", "P", "R" and "B". A second character cluster may
correspond to the characters "O", "C", "D", "U", and "Q". A third cluster may correspond to only one
character such as the character "Z". A particular character may be in more than one character cluster. In
this example, the character "D" is in both the first and the second character clusters, because its bitmaps
are similar to those of both clusters.

In a preferred embodiment, before training, characters are clustered based on a confusion matrix. The
confusion matrix represents the likelihood that one character will be confused with another character for
every possible pair of characters. In general, the closer the feature vectors of one character are to those of
another character, the higher the likelihood that those two characters may be confused. For example, the
character "D" may have a higher confusion likelihood with respect to the "O" than to the "M", if the feature
vectors for "D" are closer to the feature vectors for "O" than to those for "M".

In a preferred embodiment, the clustering of characters is based upon a conventional K-Means
Clustering Algorithm, in which a set of templates is specified for each character, where each template is a
point in feature space. The K-Means Clustering Algorithm determines where in feature space to locate the
templates for a particular character by analyzing the locations of the feature vectors for all of the training
inputs corresponding to that character. Templates are preferably positioned near the arithmetic means of
clusters of associated feature vectors.

In a preferred embodiment, four templates may be used for each character and the number of
characters per cluster may be roughly even. For example, when classifying the 64 characters corresponding
to the 26 capital and 26 lower-case letters, the 10 digits, and two symbols, 4x64 or 256
templates may be used to define 7 different clusters of roughly equivalent numbers of characters.

By clustering characters, top-level classifier 602 may implement a classification algorithm that quickly
and accurately determines the appropriate cluster for each input. In a preferred embodiment, top-level
classifier 602 implements a neuron-based classification algorithm. In another preferred embodiment, other
conventional non-neural classification algorithms may be performed by top-level classifier 602. Top-level
classifier 602 selects the appropriate cluster for a particular input and directs processing to continue to the
appropriate cluster classifier 604, 606, ..., 608. Each cluster classifier is associated with one and only one
character cluster, and vice versa.

In one preferred embodiment, each cluster classifier may implement a classification algorithm unique to
that character cluster, or shared by only a subset of the total number of character clusters. Each cluster
classifier may therefore employ neurons that exist in a feature space unique to that character cluster. For
example, training for the "P", "R", "B" cluster may employ a particular set of Grid features, while training
for the "O", "C", "D", "U", "Q" cluster may employ a different set of Hadamard features. In that case,
different training procedures are performed for each different cluster classifier, where only inputs
corresponding to those characters of the associated cluster are used for each different training procedure.

In a third preferred embodiment of the present invention, a classification system according to Fig. 6
may classify inputs into one of a set of possible outputs using neurons and cluster classifiers. In this third
embodiment, top-level classifier 602 identifies the template in feature space closest to the feature vector for
the current input to be classified. The identified template is associated with a particular character that
belongs to one or more character clusters. The top-level classifier 602 directs processing to only those
cluster classifiers 604, 606, ..., 608 associated with the character clusters of the closest template. Since a
particular character may be in more than one character cluster, more than one cluster classifier may be
selected by top-level classifier 602 for processing.
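
The routing performed by the top-level classifier in this third embodiment may be sketched as follows (Python). The sketch assumes that "closest" means smallest Euclidean distance, uses a single hypothetical template per character for brevity, and borrows the example clusters given earlier; the function names and template coordinates are illustrative only.

```python
def nearest_template_character(templates, f):
    """Character of the template closest to feature vector f
    (squared Euclidean distance is assumed)."""
    return min(templates,
               key=lambda t: sum((a - b) ** 2 for a, b in zip(t[1], f)))[0]


def select_cluster_classifiers(clusters, templates, f):
    """Indices of every cluster that contains the nearest template's
    character; a character may belong to more than one cluster."""
    character = nearest_template_character(templates, f)
    return [i for i, members in enumerate(clusters) if character in members]


clusters = [{"D", "P", "R", "B"}, {"O", "C", "D", "U", "Q"}, {"Z"}]
templates = [("D", (0.0, 0.0)), ("O", (1.0, 0.0)), ("Z", (5.0, 5.0))]
print(select_cluster_classifiers(clusters, templates, (0.1, 0.0)))  # [0, 1]
print(select_cluster_classifiers(clusters, templates, (4.0, 4.0)))  # [2]
```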

In a fourth preferred embodiment, each cluster classifier may have a decision tree that identifies those
neurons that should be processed for a given input. Prior to classifying, feature vector space for a particular
cluster classifier may be divided into regions according to the distribution of feature vectors and/or neurons
in feature space. Each region contains one or more neurons, each neuron may belong to more than one
region, and two or more regions may overlap. Top-level classifier 602 may determine in which
feature-space region (or regions) the feature vector for the current input lies and may direct the selected cluster
classifiers to process only those neurons associated with the region (or those regions).

Those skilled in the art will understand that some classification systems of the present invention may
use decision trees without cluster classifiers, some may use cluster classifiers without decision trees, some
may use both, and others may use neither. Those skilled in the art will further understand that decision
trees and cluster classifiers may increase the efficiency of classification systems of the present invention by
reducing processing time.

PREFERRED AND ALTERNATIVE PREFERRED EMBODIMENTS 

Those skilled in the art will understand that classifying systems of the present invention may be
arranged in series or parallel. For example, in a preferred embodiment, a first character classifier based on
Grid features may be arranged in series with a second character classifier based on Hadamard features. In
such case, the first classifier classifies a particular bitmap input as one of the known characters or it fails to
classify that input. If it fails to classify, then the second classifier attempts to classify that input.

In an alternative embodiment, two or more different classifiers may be arranged in parallel. In such
case, a voting scheme may be employed to select the appropriate output by comparing the outputs of each
different classifier.
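
The series and parallel arrangements may be sketched as follows (Python). Each classifier is modeled as a function that returns an output index or -1 when it fails to classify. The specification does not fix a particular voting scheme, so the plurality vote with ties treated as a non-classification is an assumption, as are all function names.

```python
from collections import Counter


def classify_in_series(classifiers, x):
    """Try each classifier in order; return the first definite answer,
    or -1 if every classifier fails to classify."""
    for classifier in classifiers:
        result = classifier(x)
        if result != -1:
            return result
    return -1


def classify_by_vote(classifiers, x):
    """Assumed plurality vote over the definite answers; a tie for first
    place, or no definite answer at all, yields -1."""
    votes = Counter(r for r in (c(x) for c in classifiers) if r != -1)
    if not votes:
        return -1
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return -1
    return ranked[0][0]


def grid_classifier(x):        # hypothetical classifier that fails on x
    return -1


def hadamard_classifier(x):    # hypothetical classifier that answers 7
    return 7


print(classify_in_series([grid_classifier, hadamard_classifier], "bitmap"))  # 7
print(classify_by_vote([lambda x: 3, lambda x: 3, lambda x: 5], "bitmap"))   # 3
print(classify_by_vote([lambda x: 3, lambda x: 5], "bitmap"))                # -1
```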

In a preferred embodiment, classification systems and training systems of the present invention perform
parallel processing, where each elliptical processing unit may run on a separate computer processor during
classification, although those skilled in the art will understand that these systems may also perform serial
processing. In a preferred embodiment, the classification systems and training systems may reside in a
reduced instruction set computer (RISC) processor such as a SPARC 2 processor running on a
SPARCstation 2 marketed by Sun Microsystems.

Those skilled in the art will understand that inputs other than character images may be classified with
the classification systems of the present invention. In general, any input may be classified as being one of a
set of two or more possible outputs, where a no-selection result is one of the possible outputs. For example,
the classification systems of the present invention may be used to identify persons based upon images of
their faces, fingerprints, or even earlobes. Other classification systems of the present invention may be used
to identify people from recordings of their voices.

It will be further understood that various changes in the details, materials, and arrangements of the parts 
which have been described and illustrated in order to explain the nature of this invention may be made by 
those skilled in the art without departing from the principle and scope of the invention as expressed in the 
following claims. 

Claims

1. A classification method for classifying an input into one of a plurality of possible outputs, comprising 
the steps of: 

(a) generating a feature vector representative of said input; 

(b) calculating a distance measure from said feature vector to the center of each neuron of a plurality 
of neurons, wherein said each neuron is associated with one possible output of said plurality of 
possible outputs; 

(c) selecting each neuron of said plurality of neurons that encompasses said feature vector in 
accordance with said distance measure; 

(d) determining a vote for each possible output of said plurality of possible outputs, wherein said 
vote is a function of the number of said selected neurons that are associated with said each possible 
output; 

(e) if one said vote for a possible output is greater than all other said votes for all other possible 
outputs then selecting said possible output as corresponding to said input; 

(f) if one said vote for a possible output is not greater than all other said votes for all other possible 
outputs then identifying a neuron of said plurality of neurons that has the smallest distance measure 
of all other said neurons; and 

(g) if said smallest distance measure is less than a specified value, then selecting the possible 
output associated with said identified neuron as corresponding to said input.

2. The classification method of claim 1, wherein step (d) comprises the step of determining a vote for
each possible output of said plurality of possible outputs, wherein said vote is equal to the number of 
said selected neurons that are associated with said each possible output. 
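
By way of non-limiting illustration, steps (a) through (g) of claim 1, with the vote of claim 2, may be sketched as follows. The sketch assumes a hyperelliptical neuron whose distance measure is the sum of squared, axis-normalized offsets from its center, with a value below 1 meaning that the neuron encompasses the feature vector. The `Neuron` type, the `classify` function, and the `max_distance` parameter are hypothetical names chosen for the example, and `None` stands for a no-selection result.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Neuron:
    center: Sequence[float]
    axes: Sequence[float]  # semi-axis lengths (assumed elliptical boundary)
    output: str            # the possible output associated with this neuron


def distance(neuron: Neuron, feature: Sequence[float]) -> float:
    """Assumed distance measure: below 1.0 means the neuron encompasses the vector."""
    return sum(((f - c) / b) ** 2
               for f, c, b in zip(feature, neuron.center, neuron.axes))


def classify(feature, neurons, max_distance) -> Optional[str]:
    if not neurons:
        return None
    dists = [distance(n, feature) for n in neurons]              # step (b)
    selected = [n for n, d in zip(neurons, dists) if d < 1.0]    # step (c)
    votes = {}
    for n in selected:                                           # step (d), claim 2
        votes[n.output] = votes.get(n.output, 0) + 1
    ranked = sorted(votes.values(), reverse=True)
    if ranked and (len(ranked) == 1 or ranked[0] > ranked[1]):   # step (e)
        return max(votes, key=votes.get)
    nearest = min(range(len(neurons)), key=dists.__getitem__)    # step (f)
    if dists[nearest] < max_distance:                            # step (g)
        return neurons[nearest].output
    return None                                                  # no selection
```

A tie in the vote, or the absence of any encompassing neuron, falls through to the nearest-neuron test of steps (f) and (g).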

3. A training method for generating a neuron, comprising the steps of:

(a) selecting a plurality of training inputs, wherein each training input corresponds to a first possible 
output; 

(b) characterizing the quality of each of said plurality of training inputs; 

(c) selecting from said characterized training inputs a training input that is of higher quality than at 
least one other training input of said characterized training inputs; and 

(d) creating said neuron in accordance with said selected characterized training input. 
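
By way of non-limiting illustration, steps (a) through (d) of claim 3 may be sketched as follows. The claim does not fix how quality is characterized, so the sketch takes a caller-supplied `quality` function (hypothetical) and, as one possible choice, selects the highest-scoring training input. Centering the new neuron on that input and giving it equal initial axes of length `initial_axis` are likewise assumptions of the example.

```python
def create_neuron(training_inputs, quality, output, initial_axis=1.0):
    """Create a neuron for `output` from the highest-quality training input.

    training_inputs: feature vectors that all correspond to `output` (step (a)).
    quality: hypothetical scoring function; a larger score means higher quality.
    """
    scored = [(quality(x), x) for x in training_inputs]      # step (b)
    _, best = max(scored, key=lambda pair: pair[0])          # step (c)
    return {                                                 # step (d)
        "center": list(best),
        "axes": [initial_axis] * len(best),
        "output": output,
    }
```

Selecting the maximum satisfies step (c) whenever at least one other training input scores lower; other selection rules would serve equally well.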

4. The training method of claim 3, wherein said plurality of training inputs are representative of characters. 

5. The training method of claim 3, wherein said neuron comprises a boundary defined by two or more 
neuron axes of different length. 

6. The training method of claim 3, comprising the further steps of: 

(e) selecting at least one training input corresponding to a second possible output different from said 
first possible output; 

(f) spatially adjusting said neuron in accordance with said training input selected in step (e). 

7. The training method of claim 6, wherein said adjusted neuron comprises a boundary defined by two or
more neuron axes of different length. 

8. The training method of claim 3, wherein said first training input represents a nominal input. 

9. The training method of claim 3, wherein said second training input represents a real input. 

10. The training method of claim 3, wherein said first training input has a higher signal-to-noise ratio than 
said second training input. 

11. A classification method for classifying an input into one of a plurality of possible outputs, comprising 
the steps of: 

(a) classifying said input into a cluster representative of two or more possible outputs; 

(b) classifying said input into one of said two or more possible outputs represented by said cluster; 
and 

wherein at least one of step (a) and step (b) is characterized by comparing information representative
of said input to a neuron.

12. The classification method of claim 11, wherein said neuron comprises a boundary defined by two or 
more neuron axes of different length. 

13. The classification method of claim 11, wherein said cluster is one of a plurality of clusters, wherein 
each of said clusters is representative of one or more possible outputs, wherein said neuron is one of a 
plurality of neurons in a feature space, wherein said feature space is divided into regions, wherein each 
of said regions has at least one neuron and each of said regions is associated with one of said clusters, 
and wherein step (b) comprises the step of classifying said input into one of said two or more possible 
outputs by comparing information representative of said input to one or more of said plurality of 
neurons that are associated with one of said clusters. 
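
By way of non-limiting illustration, the two-stage classification of claims 11 and 13 may be sketched as follows. For brevity the sketch uses a nearest-neuron rule in both stages, which is an assumption; the claims require only that at least one stage compare the input to a neuron. The dictionary layout, the function names, and the normalized squared distance are hypothetical.

```python
def ellipse_distance(feature, center, axes):
    """Assumed distance measure for a neuron with axes of different length."""
    return sum(((f - c) / b) ** 2 for f, c, b in zip(feature, center, axes))


def nearest(feature, neurons):
    """Return the neuron whose distance measure to `feature` is smallest."""
    return min(neurons,
               key=lambda n: ellipse_distance(feature, n["center"], n["axes"]))


def classify_two_stage(feature, cluster_neurons, neurons_by_cluster):
    # Step (a): classify the input into a cluster of two or more possible outputs.
    cluster = nearest(feature, cluster_neurons)["label"]
    # Step (b): compare the input only to neurons associated with that cluster.
    output = nearest(feature, neurons_by_cluster[cluster])["label"]
    return cluster, output
```

Restricting the second stage to the neurons of one cluster reduces the number of comparisons needed to separate similar outputs, such as visually similar characters.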

14. A method for adjusting a neuron encompassing a plurality of feature vectors, comprising the steps of: 

(a) characterizing the spatial distribution of said feature vectors; and 

(b) spatially adjusting said neuron in accordance with the characterization of step (a). 

15. The method of claim 14, wherein said neuron comprises a boundary defined by two or more neuron 
axes of different length. 

16. The method of claim 14, wherein said neuron comprises a center point, and wherein step (b) comprises 
the step of translating said center point of said neuron toward the mean of said spatial distribution. 

17. The method of claim 16, wherein said translated neuron comprises a boundary defined by two or more 
neuron axes, wherein said translated neuron does not encompass all of said feature vectors, and 
wherein step (b) further comprises the step of changing the length of at least one of said neuron axes 
to encompass all of said feature vectors. 
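
By way of non-limiting illustration, the adjustment of claims 14, 16 and 17 may be sketched as follows. The sketch characterizes the spatial distribution by its mean, translates the center a fraction `step` of the way toward that mean, and then, if any feature vector falls outside the translated neuron, lengthens the axes until all are encompassed. Scaling every axis by a common factor, the `margin` parameter, and the normalized squared distance are assumptions of the example; claim 17 requires only that at least one axis change length.

```python
import math


def adjust_neuron(center, axes, vectors, step=1.0, margin=1.01):
    """Translate a neuron toward the mean of `vectors`, then grow it to cover them."""
    dim = len(center)
    # Step (a): characterize the spatial distribution by its mean.
    mean = [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]
    # Claim 16: translate the center toward the mean (step=1.0 moves it onto the mean).
    new_center = [c + step * (m - c) for c, m in zip(center, mean)]

    def dist(v):
        return sum(((x - c) / b) ** 2 for x, c, b in zip(v, new_center, axes))

    # Claim 17: lengthen the axes if the translated neuron misses any vector.
    worst = max(dist(v) for v in vectors)
    new_axes = list(axes)
    if worst >= 1.0:
        scale = math.sqrt(worst) * margin
        new_axes = [b * scale for b in axes]
    return new_center, new_axes
```

Uniform scaling preserves the ratio between the axes, so an elongated neuron remains elongated after the adjustment.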

18. A classification apparatus for classifying an input into one of a plurality of possible outputs, comprising: 

generating means for generating a feature vector representative of said input; 

calculating means for calculating a distance measure from said feature vector to the center of each 
neuron of a plurality of neurons, wherein said each neuron is associated with one possible output of 
said plurality of possible outputs; 

selecting means for selecting each neuron of said plurality of neurons that encompasses said 
feature vector in accordance with said distance measure; 

determining means for determining a vote for each possible output of said plurality of possible 
outputs, wherein said vote is a function of the number of said selected neurons that are associated with 
said each possible output; 

voting means for selecting a possible output as corresponding to said input, if said vote for said 
possible output is greater than all other said votes for all other possible outputs; 

identifying means for identifying a neuron of said plurality of neurons that has the smallest distance 
measure of all other said neurons, if one said vote for a possible output is not greater than all other said 
votes for all other possible outputs; and 

distance means for selecting the possible output associated with said identified neuron as 
corresponding to said input, if said smallest distance measure is less than a specified value. 

19. The classification apparatus of claim 18, wherein said determining means determines a vote for each
possible output of said plurality of possible outputs, wherein said vote is equal to the number of said 
selected neurons that are associated with said each possible output. 

20. A training apparatus for generating a neuron, comprising: 

means for selecting a plurality of training inputs, wherein each training input corresponds to a first 
possible output; 

means for characterizing the quality of each of said plurality of training inputs; 
means for selecting from said characterized training inputs a training input that is of higher quality 
than at least one other training input of said characterized training inputs; and 

means for creating said neuron in accordance with said selected characterized training input. 

21. The training apparatus of claim 20, wherein said plurality of training inputs are representative of
characters. 

22. The training apparatus of claim 20, wherein said neuron comprises a boundary defined by two or more 
neuron axes of different length.

23. The training apparatus of claim 20, further comprising:

selecting means for selecting at least one training input corresponding to a second possible output
different from said first possible output; and

means for spatially adjusting said neuron in accordance with said training input selected by said
selecting means.

24. The training apparatus of claim 23, wherein said adjusted neuron comprises a boundary defined by two 
or more neuron axes of different length. 

25. The training apparatus of claim 20, wherein said first training input represents a nominal input. 

26. The training apparatus of claim 20, wherein said second training input represents a real input.

27. The training apparatus of claim 20, wherein said first training input has a higher signal-to-noise ratio
than said second training input. 

28. A classification apparatus for classifying an input into one of a plurality of possible outputs, comprising:

first classifying means for classifying said input into a cluster representative of two or more
possible outputs; and

second classifying means for classifying said input into one of said two or more possible outputs 
represented by said cluster; and 

wherein at least one of said first classifying means and said second classifying means compares 
information representative of said input to a neuron. 

29. The classification apparatus of claim 28, wherein said neuron comprises a boundary defined by two or 
more neuron axes of different length. 

30. The classification apparatus of claim 28, wherein said cluster is one of a plurality of clusters, wherein 
each of said clusters is representative of one or more possible outputs, wherein said neuron is one of a
plurality of neurons in a feature space, wherein said feature space is divided into regions, wherein each
of said regions has at least one neuron and each of said regions is associated with one of said clusters,
and wherein said second classifying means classifies said input into one of said two or more possible
outputs by comparing information representative of said input to one or more of said plurality of
neurons that are associated with one of said clusters.

31. An apparatus for adjusting a neuron encompassing a plurality of feature vectors, comprising:

characterizing means for characterizing the spatial distribution of said feature vectors; and

adjusting means for spatially adjusting said neuron in accordance with the characterization by said
characterizing means.

32. The apparatus of claim 31, wherein said neuron comprises a boundary defined by two or more neuron
axes of different length. 

33. The apparatus of claim 31, wherein said neuron comprises a center point, and wherein said adjusting
means translates said center point of said neuron toward the mean of said spatial distribution. 

34. The apparatus of claim 33, wherein said translated neuron comprises a boundary defined by two or 
more neuron axes, wherein said translated neuron does not encompass all of said feature vectors, and 
wherein said adjusting means changes the length of at least one of said neuron axes to encompass all
of said feature vectors. 